67 research outputs found

    On the efficient parallel computation of Legendre transforms

    In this article, we discuss a parallel implementation of efficient algorithms for computation of Legendre polynomial transforms and other orthogonal polynomial transforms. We develop an approach to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the accuracy, efficiency, and scalability of our implementation. The algorithms were implemented in ANSI C using the BSPlib communications library. We also present a new algorithm for computing the cosine transform of two vectors at the same time.
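The idea of transforming two real vectors at once can be illustrated with a minimal sketch. It assumes the standard packing trick (store one vector in the real part and the other in the imaginary part of a complex vector, then split the result by linearity); this is not necessarily the algorithm of the paper. The names `dct2` and `dct2_pair` are hypothetical, and the transform is a naive O(N^2) DCT-II rather than a fast one.

```python
import math

def dct2(v):
    """Naive O(N^2) DCT-II; works for real or complex entries."""
    n = len(v)
    return [
        sum(v[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        for k in range(n)
    ]

def dct2_pair(x, y):
    """Transform two real vectors with a single complex-valued transform.

    Assumption: x and y are real and have the same length.
    Because the cosine transform has real coefficients, the real part of
    the result is the transform of x and the imaginary part that of y.
    """
    z = [complex(a, b) for a, b in zip(x, y)]
    big_z = dct2(z)
    return [c.real for c in big_z], [c.imag for c in big_z]

x = [1.0, 2.0, 3.0, 4.0]
y = [0.5, -1.0, 0.0, 2.0]
cx, cy = dct2_pair(x, y)
```

In a fast implementation the single complex transform would be carried out by a complex FFT, so the two real transforms share one pass over the data.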

    Partitioning 3D space for parallel many-particle simulations

    In a common approach for parallel processing applied to simulations of many-particle systems with short-ranged interactions and uniform density, the simulation cell is partitioned into domains of equal shape and size, each of which is assigned to one processor. We compare the commonly used simple-cubic (SC) domain shape to domain shapes chosen as the Voronoi cells of BCC and FCC lattices. The latter two are found to result in superior partitionings with respect to communication overhead. Other domain shapes, relevant for a small number of processors, are also discussed. The higher efficiency with BCC and FCC partitionings is demonstrated in simulations of the sillium model for amorphous silicon.
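A rough way to see why the BCC and FCC cells can help is to compare surface area at equal volume. The sketch below assumes that the surface of a domain is a reasonable first-order proxy for communication overhead with short-ranged interactions; the paper's own cost measure may be more refined. It uses the textbook formulas for the cube, the truncated octahedron (Voronoi cell of BCC), and the rhombic dodecahedron (Voronoi cell of FCC) with edge length `a`.

```python
import math

def normalized_surface(surface, volume):
    """Surface area of a shape rescaled to unit volume (scale-invariant)."""
    return surface / volume ** (2.0 / 3.0)

a = 1.0  # edge length; the normalized value does not depend on it
shapes = {
    # name: (surface area, volume)
    "SC (cube)": (6 * a**2, a**3),
    "BCC (truncated octahedron)": (
        (6 + 12 * math.sqrt(3)) * a**2,
        8 * math.sqrt(2) * a**3,
    ),
    "FCC (rhombic dodecahedron)": (
        8 * math.sqrt(2) * a**2,
        16 * math.sqrt(3) / 9 * a**3,
    ),
}

ratios = {name: normalized_surface(s, v) for name, (s, v) in shapes.items()}
for name, r in ratios.items():
    print(f"{name}: {r:.4f}")
```

Under this proxy both lattice-based cells have roughly 11% less surface than a cube of the same volume, which is consistent with the direction of the result stated in the abstract.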

    A medium-grain method for fast 2D bipartitioning of sparse matrices

    We present a new hypergraph-based method, the medium-grain method, for solving the sparse matrix partitioning problem. This problem arises when distributing data for parallel sparse matrix-vector multiplication. In the medium-grain method, each matrix nonzero is assigned to either a row group or a column group, and these groups are represented by vertices of the hypergraph. For an m × n sparse matrix, the resulting hypergraph has m + n vertices and m + n hyperedges. Furthermore, we present an iterative refinement procedure for improvement of a given partitioning, based on the medium-grain method, which can be applied as a cheap but effective postprocessing step after any partitioning method. The medium-grain method is able to produce fully two-dimensional bipartitionings, but its computational complexity equals that of one-dimensional methods. Experimental results for a large set of sparse test matrices show that the medium-grain method with iterative refinement produces bipartitionings with lower communication volume compared to current state-of-the-art methods, and is faster at producing them.
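The grouping step can be sketched as follows. The abstract does not say how a nonzero chooses between its row group and its column group, so the rule used here (prefer the row group when the row has fewer nonzeros than the column) is an assumption made only for illustration, and `medium_grain_split` is a hypothetical name.

```python
from collections import Counter

def medium_grain_split(nonzeros):
    """Assign each nonzero (i, j) to row group i or column group j.

    Assumed heuristic (not taken from the abstract): a nonzero joins its
    row group if its row holds fewer nonzeros than its column, and its
    column group otherwise.
    """
    row_count = Counter(i for i, _ in nonzeros)
    col_count = Counter(j for _, j in nonzeros)
    row_groups, col_groups = {}, {}
    for i, j in nonzeros:
        if row_count[i] < col_count[j]:
            row_groups.setdefault(i, []).append((i, j))
        else:
            col_groups.setdefault(j, []).append((i, j))
    return row_groups, col_groups

# Sparsity pattern of a small 3 x 3 example matrix.
nonzeros = [(0, 0), (0, 1), (0, 2), (1, 0), (2, 0), (2, 2)]
row_groups, col_groups = medium_grain_split(nonzeros)
```

Each of the m rows and n columns owns at most one group, which is where the m + n vertices of the hypergraph come from; a partitioner then moves whole groups rather than individual nonzeros.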

    Parallel Fast Legendre Transform

    We discuss a parallel implementation of a fast algorithm for the discrete polynomial Legendre transform. We give an introduction to the Driscoll-Healy algorithm using polynomial arithmetic and present experimental results on the efficiency and accuracy of our implementation. The algorithms were implemented in ANSI C using the BSPlib communications library. Furthermore, we present a new algorithm for computing the Chebyshev transform of two vectors at the same time.

    A geometric partitioning method for distributed tomographic reconstruction

    Tomography is a powerful technique for 3D imaging of the interior of an object. With the growing sizes of typical tomographic data sets, the computational requirements for algorithms in tomography are rapidly increasing. Parallel and distributed-memory methods for tomographic reconstruction are therefore becoming increasingly common. An underexposed aspect is the effect of the data distribution on the performance of distributed-memory reconstruction algorithms. In this work, we introduce a geometric partitioning method, which takes into account the acquisition geometry and aims to minimize the necessary communication between nodes for distributed-memory forward projection and back projection operations. These operations are crucial subroutines for an important class of reconstruction methods. We show that the choice of data distribution has a significant impact on the runtime of these methods. With our novel partitioning method we reduce the communication volume drastically compared to straightforward distributions, by up to 90% for a number of cases, and furthermore we guarantee a specified load balance.

    DNA electrophoresis studied with the cage model

    The cage model for polymer reptation, proposed by Evans and Edwards, and its recent extension to model DNA electrophoresis, are studied by numerically exact computation of the drift velocities for polymers with a length L of up to 15 monomers. The computations show the Nernst-Einstein regime (v ~ E) followed by a regime where the velocity decreases exponentially with the applied electric field strength. In agreement with de Gennes' reptation arguments, we find that asymptotically for large polymers the diffusion coefficient D decreases quadratically with polymer length; for the cage model, the proportionality coefficient is DL^2 = 0.175(2). Additionally, we find that the leading correction term for finite polymer lengths scales as N^{-1/2}, where N = L-1 is the number of bonds.

    Bajcsy-Zsilinkszky Endre fogságban és az ellenállás élén (1944) = Endre Bajcsy-Zsilinszky in Captivity and as Leader of the Resistance (1944)

    We present a new parallel radix-4 FFT algorithm based on the BSP model. Our parallel algorithm uses the group-cyclic distribution family, which makes it simple to understand and easy to implement. We show how to reduce the communication cost of the algorithm by a factor of 3, in the case that the input/output vector is in the cyclic distribution. We also show how to reduce computation time on computers with a cache-based architecture. We present performance results on a Cray T3E with up to 64 processors, obtaining reasonable efficiency levels for local problem sizes as small as 256 and very good efficiency levels for local sizes larger than 2048.
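The group-cyclic distribution family mentioned above can be sketched as an owner function. The definition used here is the one commonly given in the BSP literature and is an assumption with respect to this abstract, which does not spell it out: a vector of length n is cut into blocks of size c*n/p, and each block is dealt out cyclically over its own group of c processors. The name `group_cyclic_owner` is hypothetical.

```python
def group_cyclic_owner(j, n, p, c):
    """Processor that owns element j under the group-cyclic distribution.

    Assumptions: n, p and c are powers of two, c divides p, p divides n.
    c = 1 gives the block distribution, c = p the cyclic distribution.
    """
    block = c * n // p          # elements handled by one group of c processors
    group = j // block          # which group of processors
    return group * c + (j % block) % c

n, p = 16, 4
block_dist = [group_cyclic_owner(j, n, p, 1) for j in range(n)]
mixed_dist = [group_cyclic_owner(j, n, p, 2) for j in range(n)]
cyclic_dist = [group_cyclic_owner(j, n, p, 4) for j in range(n)]
```

Varying the cycle c interpolates between the block and cyclic extremes, which is what lets a parallel FFT change distribution between stages by changing a single parameter.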

    № 201. Ордер на обшук та арешт Харитона Гов’ядовського від 21 лютого 1938 р. = No. 201. Warrant for the Search and Arrest of Kharyton Hoviadovskyi, 21 February 1938

    At the 2005 Study Group Mathematics with Industry in Amsterdam, medical science was prominently represented. The word 'industry' in the name of the workshop has therefore had to be interpreted broadly for many years now. The study group thus also serves as a test of how useful mathematics really is for society. How can mathematical disciplines be put to commercial use, what do we gain from the latest developments in statistics, and what is the significance of the ever-growing knowledge of modelling with differential equations? Sometimes the outcome is disappointing, but mathematical common sense also often gives companies a different view of their problem. And the mathematicians may find themselves with a front-row view of an open-heart operation.

    Increasing Detection Performance of Surveillance Sensor Networks

    We study a surveillance wireless sensor network (SWSN) comprised of small and low-cost sensors deployed in a region in order to detect objects crossing the field of interest. In the present paper, we address two problems concerning the design and performance of an SWSN: optimal sensor placement and algorithms for object detection in the presence of false alarms. For both problems, we propose explicit decision rules and efficient algorithmic solutions. Further, we provide several numerical examples and present a simulation model that combines our placement and detection methods.

    BSP Functional Programming: Examples of a Cost Based Methodology

    Bulk-Synchronous Parallel ML (BSML) is a functional data-parallel language for the implementation of Bulk-Synchronous Parallel (BSP) algorithms. It makes an estimation of the execution time (cost) possible. This paper presents some general examples of BSML programs and a comparison of their predicted costs with the measured execution time on a parallel machine.
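The kind of cost prediction referred to above can be sketched with the standard BSP cost formula, in which a superstep with local work w and an h-relation costs w + h*g + l. The sketch is in Python rather than BSML, the function name `bsp_cost` is hypothetical, and the machine parameters g and l as well as the superstep values are made-up numbers for illustration only.

```python
def bsp_cost(supersteps, g, l):
    """Predicted BSP cost of a program, in units of basic operations.

    supersteps: list of (w, h) pairs, where w is the maximum local work of
    any processor and h the maximum number of words any processor sends or
    receives in that superstep.
    g: communication cost per word; l: synchronization cost per superstep.
    """
    return sum(w + h * g + l for w, h in supersteps)

# Hypothetical program with three supersteps on a hypothetical machine.
supersteps = [(1000, 10), (500, 0), (2000, 25)]
predicted = bsp_cost(supersteps, g=4, l=100)
print(predicted)  # 3940
```

Comparing such a prediction with a measured run time requires converting operation counts to seconds using the machine's computing rate, which has to be benchmarked along with g and l.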